Feature Reduction of Hepatocellular Carcinoma using Harris Hawks Optimization and Adaptive Ensemble Learning

Authors: Yuvraj Singh, Harshali Patil

DOI Link: https://doi.org/10.22214/ijraset.2026.82135

Abstract

Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related mortality worldwide, which makes early prediction and outcome assessment critically important. Although machine learning techniques have shown promise in clinical prediction tasks, many existing approaches rely on large feature sets, which can reduce interpretability and increase computational cost. This work presents a hybrid framework that identifies a compact, informative subset of clinical features while maintaining reliable predictive performance. The approach combines statistical filtering with Harris Hawks Optimization (HHO) to refine the feature space. In addition, an adaptive ensemble strategy is employed, in which Bagging and Boosting models are evaluated and the better-performing model is selected based on F1-score. The model is evaluated using stratified 5-fold cross-validation. The results show that the proposed method reduces the feature space by approximately 62% while achieving an average accuracy of 73.33 ± 5.42% and an F1-score of 78.20 ± 5.37%. Furthermore, the consistency of selected features across folds indicates stable and meaningful feature selection. Overall, the framework demonstrates a balance between efficiency, interpretability, and predictive performance, making it suitable for clinical decision support applications.

Introduction

This paper describes a machine learning approach for predicting outcomes in hepatocellular carcinoma (HCC), a major form of liver cancer for which accurate prediction is crucial to clinical decision-making.

It explains that traditional methods struggle with complex medical datasets, while machine learning models often become too complex and less interpretable when using many features. To address this, the study proposes a hybrid feature reduction and adaptive ensemble framework.

The proposed method combines:

Statistical filtering (correlation analysis and Fisher score) to remove irrelevant or redundant features
Harris Hawks Optimization (HHO) to select the most informative feature subset
An adaptive ensemble system that dynamically chooses between Bagging and Boosting models based on performance (F1-score) for each cross-validation fold
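
The statistical filtering step can be illustrated with a minimal sketch. It assumes that features with a low Fisher score are treated as irrelevant and that a feature is treated as redundant when it is strongly correlated with a higher-scoring one. The summary does not state the exact rule, so the function names, the `min_score` and `corr_threshold` values, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = mean(column)
    between = within = 0.0
    for cls in set(labels):
        vals = [x for x, y in zip(column, labels) if y == cls]
        between += len(vals) * (mean(vals) - overall) ** 2
        within += len(vals) * pvariance(vals)
    return between / within if within > 0 else 0.0


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def statistical_filter(columns, labels, min_score=0.1, corr_threshold=0.9):
    """Return (kept feature indices, Fisher scores).

    Assumed rule (not taken from the paper): visit features from highest to
    lowest Fisher score, skip those scoring at or below min_score, and skip
    those whose absolute correlation with an already kept feature exceeds
    corr_threshold.
    """
    scores = [fisher_score(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    kept = []
    for j in order:
        if scores[j] <= min_score:
            continue  # treated as irrelevant
        if any(abs(pearson(columns[j], columns[k])) > corr_threshold for k in kept):
            continue  # treated as redundant
        kept.append(j)
    return kept, scores


# Invented toy data: one column per feature, six patients, binary outcome.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # separates the two classes
    [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # rescaled copy of feature 0 (redundant)
    [5.0, 1.0, 3.0, 4.0, 2.0, 3.0],  # same mean in both classes (irrelevant)
]
kept, scores = statistical_filter(columns, labels)
print("kept features:", kept)
```

On this toy input the filter keeps a single feature: the rescaled copy is discarded as redundant and the third feature as irrelevant. In the described framework, the surviving features would then be passed to HHO for further subset selection.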

Key contributions include improved feature selection, adaptive model selection, feature stability analysis, and robust evaluation using stratified cross-validation.
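
The per-fold model selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the two candidates are trivial stand-in models (a majority-class predictor and a one-feature threshold rule) rather than real Bagging and Boosting ensembles, the choice is made on an inner validation split of each training fold (the summary does not say which data the F1 comparison uses), and all function names and data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; 0.0 when there are no positives at all."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def stratified_folds(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


def pick_by_f1(candidates, X_tr, y_tr, X_val, y_val):
    """Fit every candidate and return (name, predictor, f1) of the best one."""
    best = None
    for name, fit in candidates.items():
        predict = fit(X_tr, y_tr)
        score = f1_score(y_val, [predict(x) for x in X_val])
        if best is None or score > best[2]:
            best = (name, predict, score)
    return best


# Stand-in models; real Bagging/Boosting fit functions would go here.
def fit_majority(X, y):
    majority = max(set(y), key=y.count)
    return lambda x: majority


def fit_stump(X, y):
    m0 = mean([x[0] for x, t in zip(X, y) if t == 0])
    m1 = mean([x[0] for x, t in zip(X, y) if t == 1])
    thr = (m0 + m1) / 2  # threshold halfway between the class means
    return lambda x: int((x[0] > thr) == (m1 > m0))


# Invented, perfectly separable toy data with one feature.
X = [[i * 0.1] for i in range(10)] + [[1.5 + i * 0.1] for i in range(10)]
y = [0] * 10 + [1] * 10
candidates = {"majority": fit_majority, "stump": fit_stump}

chosen, test_f1 = [], []
for train, test in stratified_folds(y, k=5):
    y_tr = [y[i] for i in train]
    inner_tr, inner_val = next(stratified_folds(y_tr, k=4))  # inner split
    name, predict, _ = pick_by_f1(
        candidates,
        [X[train[i]] for i in inner_tr], [y_tr[i] for i in inner_tr],
        [X[train[i]] for i in inner_val], [y_tr[i] for i in inner_val],
    )
    chosen.append(name)
    test_f1.append(f1_score([y[i] for i in test], [predict(X[i]) for i in test]))

print(chosen, test_f1)
```

Selecting on an inner split keeps the held-out fold untouched until the final score is computed, so the reported F1 is not inflated by the selection step. In the described framework, the fold-wise scores would then be summarized as a mean and standard deviation.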

The literature review shows that existing HCC prediction models use too many features, lack interpretability, incur high computational cost, or fail to capture complex patterns effectively.

In experiments, the framework reduces the feature set from 48 to 18 features while maintaining predictive performance. This improves efficiency and interpretability by simplifying the dataset without discarding important clinical information.
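
As a quick consistency check, the two feature counts can be turned into a reduction ratio. The result agrees with the reduction of approximately 62% reported in the abstract:

```latex
\[
\text{reduction} = \frac{48 - 18}{48} = \frac{30}{48} = 0.625 = 62.5\%
\]
```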

Conclusion

This study presents a hybrid machine learning framework for HCC prediction that combines feature reduction with adaptive ensemble learning. The approach achieves a good balance between efficiency and predictive performance while maintaining interpretability. The findings indicate that reducing feature complexity does not necessarily lead to a loss in performance. Instead, a carefully selected subset of features can provide meaningful insights for clinical applications.

Copyright

Copyright © 2026 Yuvraj Singh, Harshali Patil. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET82135

Publish Date : 2026-05-07

ISSN : 2321-9653

Publisher Name : IJRASET
